Learning the Taxonomy of Function Words for Parsing

Authors

  • Dongchen Li
  • Xiantao Zhang
  • Dingsheng Luo
  • Xihong Wu

Abstract

Completely data-driven grammar training is prone to over-fitting, and human-defined word-class knowledge can help address this issue. However, a manually defined word-class taxonomy may be unreliable and ill-suited to statistical natural language processing, in addition to offering limited coverage of linguistic phenomena and poor domain adaptability. In this paper, a formal representation of function-word subcategorization for parsing is learned automatically. The function-word classification, which represents the intrinsic features of syntactic usage, is used to supervise grammar induction, while the structure of the taxonomy is learned simultaneously. Grammar learning is therefore no longer training supervised in one direction by fixed hierarchical knowledge, but an interactive process between learning the knowledge structure and training the grammar. The resulting taxonomy reflects the statistical significance of the diverse syntactic features. Experiments on both the Penn Chinese Treebank and the Tsinghua Treebank show that the proposed method improves parsing performance over the baseline by 1.6% and 7.6%, respectively.
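The abstract's premise, that word-class knowledge counters the over-fitting of purely data-driven training, can be illustrated with a generic technique: pooling statistics along a class hierarchy. The sketch below is not the authors' method (the abstract does not describe the algorithm); it is plain hierarchical additive smoothing, in which a sparse leaf class's distribution over some syntactic outcome backs off to its ancestors' distributions. The taxonomy `PARENT`, the class names, the outcome labels ("PP", "VP"), and the `alpha` parameter are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical function-word taxonomy (child -> parent; None marks the root).
# Illustrative only; not taken from the paper.
PARENT = {
    "FW": None,
    "PREP": "FW",
    "PREP_LOC": "PREP",
    "PREP_TIME": "PREP",
    "AUX": "FW",
    "CONJ": "FW",
}


def ancestors(cls):
    """Return [cls, parent(cls), ..., root]."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = PARENT[cls]
    return chain


class HierarchicalSmoother:
    """Additive smoothing whose prior for each class is its parent's estimate."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                  # weight given to the parent's estimate
        self.counts = defaultdict(Counter)  # class -> Counter of outcomes
        self.outcomes = set()               # outcome vocabulary seen so far

    def observe(self, leaf_class, outcome):
        # Each observation counts for the leaf and for every ancestor,
        # so coarse classes pool evidence from all of their subclasses.
        self.outcomes.add(outcome)
        for cls in ancestors(leaf_class):
            self.counts[cls][outcome] += 1

    def prob(self, cls, outcome):
        # The root backs off to a uniform distribution over seen outcomes;
        # every other class backs off recursively to its parent.
        parent = PARENT[cls]
        prior = 1.0 / len(self.outcomes) if parent is None else self.prob(parent, outcome)
        counts = self.counts[cls]
        total = sum(counts.values())
        return (counts[outcome] + self.alpha * prior) / (total + self.alpha)


model = HierarchicalSmoother(alpha=1.0)
for leaf, context in [("PREP_LOC", "PP"), ("PREP_LOC", "PP"),
                      ("PREP_TIME", "PP"), ("PREP_TIME", "VP"), ("AUX", "VP")]:
    model.observe(leaf, context)

print(round(model.prob("PREP_TIME", "PP"), 4))  # 0.5722
print(round(model.prob("CONJ", "PP"), 4))       # 0.5833 (unseen class: pure backoff to the root)
```

With only two observations, the leaf `PREP_TIME` is pulled toward its parent `PREP` rather than trusting its raw 50/50 split, and the never-observed `CONJ` inherits the root estimate. A learned taxonomy such as the one the abstract describes would determine which classes share evidence in this way.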

Similar resources

Automatic Semantic Role Labeling in Persian Sentences Using Dependency Trees

Automatic identification of words with semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching correct semantic roles to them, may lead to improvement in many natural language processing tasks including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...

On the Representation of Bloom's Revised Taxonomy in Interchange Coursebooks

This study intends to evaluate Interchange series (2005), which are still fundamental coursebooks in the EFL curriculum settings, in terms of learning objectives in Bloom’s Revised Taxonomy (2001) to see which levels of Bloom's Revised Taxonomy were more emphasized in these coursebooks. For this purpose, the contents of Interchange textbooks were codified based on a coding scheme designed by th...

Cubic-time Parsing and Learning Algorithms for Grammatical Bigram Models

This technical report presents a probabilistic model of English grammar that is based upon “grammatical bigrams”, i.e., syntactic relationships between pairs of words. Because of its simplicity, the grammatical bigram model admits cubic-time parsing and unsupervised learning algorithms, which are described in detail.

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...

On the Applicability of Oxford's Taxonomy of Learner Strategies to Translation Tasks

During the last three decades, especially 1980's, language learning specialists have been busy  discovering the nature of language learning strategies, describing them, and formulating their relationships with other language learning factors. In line with these studies, the field of translation studies has undergone a complete revolution in terms of its perspective toward its research prioritie...

Publication date: 2014